Geometry-Oblivious FMM for Compressing Dense SPD Matrices
We present GOFMM (geometry-oblivious FMM), a novel method that creates a
hierarchical low-rank approximation, "compression," of an arbitrary dense
symmetric positive definite (SPD) matrix. For many applications, GOFMM enables
an approximate matrix-vector multiplication in O(N log N) or even O(N) time,
where N is the matrix size. Compression requires O(N log N) storage and work.
In general, our scheme belongs to the family of hierarchical matrix
approximation methods. In particular, it generalizes the fast multipole method
(FMM) to a purely algebraic setting by only requiring the ability to sample
matrix entries. Neither geometric information (i.e., point coordinates) nor
knowledge of how the matrix entries have been generated is required, thus the
term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme
for hierarchical matrix computations that reduces synchronization barriers. We
present results on the Intel Knights Landing and Haswell architectures, and on
the NVIDIA Pascal architecture for a variety of matrices.Comment: 13 pages, accepted by SC'1
Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs
Training deep neural networks consumes an increasing share of the computational
resources in many compute centers, and hyperparameter values are often obtained
by brute force. Our goals are (1) to improve on this by enabling second-order
optimization methods with fewer hyperparameters for large-scale neural networks
and (2) to survey the performance of optimizers on specific tasks in order to
suggest to users the best one for their problem. We introduce a
novel second-order optimization method that requires the effect of the Hessian
on a vector only and avoids the huge cost of explicitly setting up the Hessian
for large-scale networks.
We compare the proposed second-order method with two state-of-the-art
optimizers on five representative neural network problems, including regression
and very deep networks from computer vision or variational autoencoders. For
the largest setup, we efficiently parallelized the optimizers with Horovod and
applied them to an 8-GPU NVIDIA P100 (DGX-1) machine.
Comment: Accepted to PPAM conference
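The key ingredient of such a matrix-free Newton-CG method is computing the Hessian-vector product H v without ever forming H. As a hypothetical sketch (not the paper's implementation), one standard way is a central finite difference of the gradient:

```python
import numpy as np

# Matrix-free Hessian-vector product: H(w) @ v is approximated from two
# gradient evaluations, so the Hessian of a large network never needs to
# be stored explicitly. (Autodiff frameworks offer exact equivalents via
# double backpropagation.)
def hvp(grad, w, v, eps=1e-5):
    """Approximate H(w) @ v as (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)."""
    return (grad(w + eps * v) - grad(w - eps * v)) / (2.0 * eps)

# Sanity check on a quadratic f(w) = 0.5 * w^T A w, whose Hessian is exactly A.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = A @ A.T                      # symmetric positive definite test Hessian
grad = lambda w: A @ w           # gradient of the quadratic
w, v = rng.standard_normal(5), rng.standard_normal(5)

assert np.allclose(hvp(grad, w, v), A @ v, atol=1e-4)
```

Feeding this operator to a conjugate-gradient solver yields the Newton step from Hessian-vector products alone, which is what makes second-order methods feasible at network scale.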
Software for Exascale Computing: SPPEXA 2016-2019
This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer's series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA's first funding phase, and provides an overview of SPPEXA's contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.